Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 40690 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 5.6 MiB |
| Average record size in memory | 144.0 B |
Variable types
| NUM | 8 |
|---|---|
| CAT | 6 |
| BOOL | 4 |
Reproduction
| Analysis started | 2020-06-26 15:35:29.775553 |
|---|---|
| Analysis finished | 2020-06-26 15:35:42.172066 |
| Duration | 12.4 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
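The same kind of report can also be produced from Python instead of the command line. The following is a minimal sketch, assuming pandas and pandas-profiling are installed; the helper name `profile_csv`, the report title, and both file paths are hypothetical placeholders, and the settings from config.yaml are not applied here.

```python
def profile_csv(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report (illustrative helper)."""
    # Imported lazily so the module still loads where the libraries are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)  # csv_path stands in for YOUR_FILE.csv
    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(html_path)  # writes the rendered report to disk
    return html_path
```

Calling `profile_csv("YOUR_FILE.csv")` would write `report.html` to the working directory.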
df_index
Real number (ℝ≥0)
| Distinct count | 40690 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22593.702924551486 |
|---|---|
| Minimum | 0 |
| Maximum | 45210 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 317.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2266.45 |
| Q1 | 11257.5 |
| median | 22562.5 |
| Q3 | 33929.75 |
| 95-th percentile | 42961.55 |
| Maximum | 45210 |
| Range | 45210 |
| Interquartile range (IQR) | 22672.25 |
Descriptive statistics
| Standard deviation | 13064.34245 |
|---|---|
| Coefficient of variation (CV) | 0.5782293631 |
| Kurtosis | -1.204288374 |
| Mean | 22593.70292 |
| Median Absolute Deviation (MAD) | 11337.5 |
| Skewness | 0.003007341099 |
| Sum | 919337772 |
| Variance | 170677043.7 |
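Several rows in these tables are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. The sketch below recomputes them from the values reported above; the function name `derived_stats` is illustrative and not part of pandas-profiling.

```python
def derived_stats(minimum, q1, q3, maximum, mean, std):
    """Recompute the derived rows of a numeric summary from its basic rows."""
    return {
        "range": maximum - minimum,  # Range = Maximum - Minimum
        "iqr": q3 - q1,              # IQR = Q3 - Q1
        "cv": std / mean,            # CV = standard deviation / mean
        "variance": std ** 2,        # Variance = standard deviation squared
    }

# Inputs copied from the quantile and descriptive statistics tables above.
stats = derived_stats(minimum=0, q1=11257.5, q3=33929.75, maximum=45210,
                      mean=22593.70292, std=13064.34245)
```

Up to rounding of the printed inputs, the results agree with the Range, IQR, CV, and Variance rows.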
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2047 | 1 | < 0.1% | |
| 3339 | 1 | < 0.1% | |
| 34074 | 1 | < 0.1% | |
| 40217 | 1 | < 0.1% | |
| 38168 | 1 | < 0.1% | |
| 11535 | 1 | < 0.1% | |
| 9486 | 1 | < 0.1% | |
| 15629 | 1 | < 0.1% | |
| 13580 | 1 | < 0.1% | |
| 1290 | 1 | < 0.1% | |
| Other values (40680) | 40680 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1 | < 0.1% | |
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 45210 | 1 | < 0.1% | |
| 45208 | 1 | < 0.1% | |
| 45207 | 1 | < 0.1% | |
| 45206 | 1 | < 0.1% | |
| 45205 | 1 | < 0.1% |
age
Real number (ℝ≥0)
| Distinct count | 77 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.90540673384124 |
|---|---|
| Minimum | 18 |
| Maximum | 95 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 317.9 KiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 27 |
| Q1 | 33 |
| median | 39 |
| Q3 | 48 |
| 95-th percentile | 59 |
| Maximum | 95 |
| Range | 77 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 10.60490825 |
|---|---|
| Coefficient of variation (CV) | 0.2592544384 |
| Kurtosis | 0.3231613161 |
| Mean | 40.90540673 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.6834969594 |
| Sum | 1664441 |
| Variance | 112.464079 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 32 | 1869 | 4.6% | |
| 31 | 1796 | 4.4% | |
| 33 | 1775 | 4.4% | |
| 34 | 1725 | 4.2% | |
| 35 | 1721 | 4.2% | |
| 36 | 1609 | 4.0% | |
| 30 | 1574 | 3.9% | |
| 37 | 1502 | 3.7% | |
| 39 | 1332 | 3.3% | |
| 38 | 1330 | 3.3% | |
| Other values (67) | 24457 | 60.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 18 | 12 | < 0.1% | |
| 19 | 30 | 0.1% | |
| 20 | 45 | 0.1% | |
| 21 | 70 | 0.2% | |
| 22 | 121 | 0.3% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 95 | 2 | < 0.1% | |
| 94 | 1 | < 0.1% | |
| 93 | 2 | < 0.1% | |
| 92 | 2 | < 0.1% | |
| 90 | 2 | < 0.1% |
job
Categorical
| Distinct count | 12 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 317.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| blue-collar | 8769 | 21.6% | |
| management | 8504 | 20.9% | |
| technician | 6818 | 16.8% | |
| admin. | 4661 | 11.5% | |
| services | 3725 | 9.2% | |
| retired | 2027 | 5.0% | |
| self-employed | 1427 | 3.5% | |
| entrepreneur | 1339 | 3.3% | |
| unemployed | 1193 | 2.9% | |
| housemaid | 1125 | 2.8% | |
| Other values (2) | 1102 | 2.7% |
Length
| Max length | 13 |
|---|---|
| Median length | 10 |
| Mean length | 9.486900958 |
| Min length | 6 |
marital
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 317.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| married | 24464 | 60.1% | |
| single | 11531 | 28.3% | |
| divorced | 4695 | 11.5% |
Length
| Max length | 8 |
|---|---|
| Median length | 7 |
| Mean length | 6.831998034 |
| Min length | 6 |
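The length statistics describe the character lengths of the category labels, weighted by how often each label occurs. As a sketch of how they follow from the frequency table above (the helper `length_stats` is illustrative, not a pandas-profiling function):

```python
from statistics import median

def length_stats(value_counts):
    """Summarise label lengths for a categorical column given {label: count}."""
    # Repeat each label's length once per occurrence so that the
    # median and mean are weighted by the observed counts.
    lengths = [len(label)
               for label, count in value_counts.items()
               for _ in range(count)]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
    }

# Counts copied from the marital frequency table above.
marital_counts = {"married": 24464, "single": 11531, "divorced": 4695}
stats = length_stats(marital_counts)
```

With these counts the helper gives a maximum of 8, a median of 7, a minimum of 6, and a mean of about 6.832, in line with the Length table.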
education
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 317.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| secondary | 20951 | 51.5% | |
| tertiary | 11917 | 29.3% | |
| primary | 6153 | 15.1% | |
| unknown | 1669 | 4.1% |
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 8.32265913 |
| Min length | 7 |
default
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 317.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| no | 39965 | 98.2% | |
| yes | 725 | 1.8% |
balance
Real number (ℝ)
| Distinct count | 6903 |
|---|---|
| Unique (%) | 17.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1359.6975178176456 |
|---|---|
| Minimum | -8019 |
| Maximum | 102127 |
| Zeros | 3139 |
| Zeros (%) | 7.7% |
| Memory size | 317.9 KiB |
Quantile statistics
| Minimum | -8019 |
|---|---|
| 5-th percentile | -173 |
| Q1 | 74 |
| median | 451 |
| Q3 | 1423 |
| 95-th percentile | 5745.55 |
| Maximum | 102127 |
| Range | 110146 |
| Interquartile range (IQR) | 1349 |
Descriptive statistics
| Standard deviation | 3034.248783 |
|---|---|
| Coefficient of variation (CV) | 2.231561611 |
| Kurtosis | 142.8032515 |
| Mean | 1359.697518 |
| Median Absolute Deviation (MAD) | 451 |
| Skewness | 8.410197358 |
| Sum | 55326092 |
| Variance | 9206665.678 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 3139 | 7.7% | |
| 1 | 169 | 0.4% | |
| 2 | 147 | 0.4% | |
| 4 | 123 | 0.3% | |
| 3 | 121 | 0.3% | |
| 5 | 102 | 0.3% | |
| 6 | 79 | 0.2% | |
| 8 | 74 | 0.2% | |
| 23 | 68 | 0.2% | |
| 7 | 64 | 0.2% | |
| Other values (6893) | 36604 | 90.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -8019 | 1 | < 0.1% | |
| -6847 | 1 | < 0.1% | |
| -4057 | 1 | < 0.1% | |
| -3372 | 1 | < 0.1% | |
| -3313 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 102127 | 1 | < 0.1% | |
| 98417 | 1 | < 0.1% | |
| 81204 | 1 | < 0.1% | |
| 71188 | 1 | < 0.1% | |
| 66721 | 1 | < 0.1% |
housing
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 317.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| yes | 22661 | 55.7% | |
| no | 18029 | 44.3% |
loan
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 317.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| no | 34177 | 84.0% | |
| yes | 6513 | 16.0% |
contact
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 317.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| cellular | 26319 | 64.7% | |
| unknown | 11771 | 28.9% | |
| telephone | 2600 | 6.4% |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 7.774612927 |
| Min length | 7 |
day
Real number (ℝ≥0)
| Distinct count | 31 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.808405013516834 |
|---|---|
| Minimum | 1 |
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 317.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 8 |
| median | 16 |
| Q3 | 21 |
| 95-th percentile | 29 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 8.318280773 |
|---|---|
| Coefficient of variation (CV) | 0.5261935512 |
| Kurtosis | -1.058766298 |
| Mean | 15.80840501 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.0932807123 |
| Sum | 643244 |
| Variance | 69.19379502 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 20 | 2467 | 6.1% | |
| 18 | 2090 | 5.1% | |
| 21 | 1820 | 4.5% | |
| 17 | 1764 | 4.3% | |
| 6 | 1738 | 4.3% | |
| 5 | 1728 | 4.2% | |
| 14 | 1646 | 4.0% | |
| 8 | 1644 | 4.0% | |
| 28 | 1643 | 4.0% | |
| 7 | 1636 | 4.0% | |
| Other values (21) | 22514 | 55.3% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 287 | 0.7% | |
| 2 | 1151 | 2.8% | |
| 3 | 975 | 2.4% | |
| 4 | 1300 | 3.2% | |
| 5 | 1728 | 4.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 31 | 583 | 1.4% | |
| 30 | 1407 | 3.5% | |
| 29 | 1566 | 3.8% | |
| 28 | 1643 | 4.0% | |
| 27 | 1004 | 2.5% |
month
Categorical
| Distinct count | 12 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 317.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| may | 12413 | 30.5% | |
| jul | 6214 | 15.3% | |
| aug | 5606 | 13.8% | |
| jun | 4848 | 11.9% | |
| nov | 3535 | 8.7% | |
| apr | 2646 | 6.5% | |
| feb | 2363 | 5.8% | |
| jan | 1268 | 3.1% | |
| oct | 660 | 1.6% | |
| sep | 520 | 1.3% | |
| Other values (2) | 617 | 1.5% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
duration
Real number (ℝ≥0)
| Distinct count | 1530 |
|---|---|
| Unique (%) | 3.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 258.2438436962399 |
|---|---|
| Minimum | 0 |
| Maximum | 4918 |
| Zeros | 3 |
| Zeros (%) | < 0.1% |
| Memory size | 317.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 35 |
| Q1 | 103 |
| median | 180 |
| Q3 | 319 |
| 95-th percentile | 752.55 |
| Maximum | 4918 |
| Range | 4918 |
| Interquartile range (IQR) | 216 |
Descriptive statistics
| Standard deviation | 257.5770676 |
|---|---|
| Coefficient of variation (CV) | 0.9974180368 |
| Kurtosis | 18.14829807 |
| Mean | 258.2438437 |
| Median Absolute Deviation (MAD) | 93 |
| Skewness | 3.138967323 |
| Sum | 10507942 |
| Variance | 66345.94575 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 124 | 169 | 0.4% | |
| 90 | 165 | 0.4% | |
| 89 | 160 | 0.4% | |
| 136 | 159 | 0.4% | |
| 114 | 158 | 0.4% | |
| 122 | 158 | 0.4% | |
| 139 | 158 | 0.4% | |
| 112 | 158 | 0.4% | |
| 104 | 157 | 0.4% | |
| 113 | 156 | 0.4% | |
| Other values (1520) | 39092 | 96.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 3 | < 0.1% | |
| 1 | 2 | < 0.1% | |
| 2 | 3 | < 0.1% | |
| 3 | 4 | < 0.1% | |
| 4 | 14 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 4918 | 1 | < 0.1% | |
| 3881 | 1 | < 0.1% | |
| 3785 | 1 | < 0.1% | |
| 3366 | 1 | < 0.1% | |
| 3322 | 1 | < 0.1% |
campaign
Real number (ℝ≥0)
| Distinct count | 47 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.764585893339887 |
|---|---|
| Minimum | 1 |
| Maximum | 63 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 317.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 8 |
| Maximum | 63 |
| Range | 62 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 3.110157616 |
|---|---|
| Coefficient of variation (CV) | 1.124999452 |
| Kurtosis | 39.85929663 |
| Mean | 2.764585893 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 4.931344977 |
| Sum | 112491 |
| Variance | 9.673080397 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 15817 | 38.9% | |
| 2 | 11218 | 27.6% | |
| 3 | 5003 | 12.3% | |
| 4 | 3161 | 7.8% | |
| 5 | 1589 | 3.9% | |
| 6 | 1140 | 2.8% | |
| 7 | 646 | 1.6% | |
| 8 | 496 | 1.2% | |
| 9 | 296 | 0.7% | |
| 10 | 247 | 0.6% | |
| Other values (37) | 1077 | 2.6% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 15817 | 38.9% | |
| 2 | 11218 | 27.6% | |
| 3 | 5003 | 12.3% | |
| 4 | 3161 | 7.8% | |
| 5 | 1589 | 3.9% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 63 | 1 | < 0.1% | |
| 58 | 1 | < 0.1% | |
| 55 | 1 | < 0.1% | |
| 51 | 1 | < 0.1% | |
| 50 | 2 | < 0.1% |
pdays
Real number (ℝ)
| Distinct count | 548 |
|---|---|
| Unique (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.05986728926026 |
|---|---|
| Minimum | -1 |
| Maximum | 871 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 317.9 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | -1 |
| Q1 | -1 |
| median | -1 |
| Q3 | -1 |
| 95-th percentile | 317 |
| Maximum | 871 |
| Range | 872 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 100.0782815 |
|---|---|
| Coefficient of variation (CV) | 2.498217998 |
| Kurtosis | 7.047275272 |
| Mean | 40.05986729 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.631138312 |
| Sum | 1630036 |
| Variance | 10015.66242 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -1 | 33279 | 81.8% | |
| 182 | 153 | 0.4% | |
| 92 | 132 | 0.3% | |
| 183 | 112 | 0.3% | |
| 91 | 112 | 0.3% | |
| 181 | 99 | 0.2% | |
| 370 | 85 | 0.2% | |
| 184 | 74 | 0.2% | |
| 95 | 68 | 0.2% | |
| 364 | 66 | 0.2% | |
| Other values (538) | 6510 | 16.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -1 | 33279 | 81.8% | |
| 1 | 15 | < 0.1% | |
| 2 | 32 | 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 871 | 1 | < 0.1% | |
| 854 | 1 | < 0.1% | |
| 850 | 1 | < 0.1% | |
| 842 | 1 | < 0.1% | |
| 838 | 1 | < 0.1% |
previous
Real number (ℝ≥0)
| Distinct count | 41 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5794052592774638 |
|---|---|
| Minimum | 0 |
| Maximum | 275 |
| Zeros | 33279 |
| Zeros (%) | 81.8% |
| Memory size | 317.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 3 |
| Maximum | 275 |
| Range | 275 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.350663681 |
|---|---|
| Coefficient of variation (CV) | 4.057028553 |
| Kurtosis | 4615.65138 |
| Mean | 0.5794052593 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 43.46842768 |
| Sum | 23576 |
| Variance | 5.52561974 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 33279 | 81.8% | |
| 1 | 2490 | 6.1% | |
| 2 | 1886 | 4.6% | |
| 3 | 1032 | 2.5% | |
| 4 | 639 | 1.6% | |
| 5 | 408 | 1.0% | |
| 6 | 255 | 0.6% | |
| 7 | 190 | 0.5% | |
| 8 | 112 | 0.3% | |
| 9 | 81 | 0.2% | |
| Other values (31) | 318 | 0.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 33279 | 81.8% | |
| 1 | 2490 | 6.1% | |
| 2 | 1886 | 4.6% | |
| 3 | 1032 | 2.5% | |
| 4 | 639 | 1.6% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 275 | 1 | < 0.1% | |
| 58 | 1 | < 0.1% | |
| 55 | 1 | < 0.1% | |
| 51 | 1 | < 0.1% | |
| 41 | 1 | < 0.1% |
poutcome
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 317.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| unknown | 33284 | 81.8% | |
| failure | 4377 | 10.8% | |
| other | 1648 | 4.1% | |
| success | 1381 | 3.4% |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 6.918997297 |
| Min length | 5 |
y
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 317.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| no | 35903 | 88.2% | |
| yes | 4787 | 11.8% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of its slope does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
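The calculation described above can be sketched in plain Python. This is an illustrative implementation with made-up sample data, not the code the profiling tool runs; the function name `pearson_r` is an assumption.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and of both standard deviations
    # cancel, so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * a - 3 for a in x], [0.5 * b + 100 for b in y])
```

For this toy sample both `r` and `r_rescaled` come out at 0.8, which illustrates the invariance under changes of location and scale.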
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
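As a sketch of that recipe, the code below ranks both variables (giving tied values their average rank, a common convention assumed here) and then applies the Pearson formula to the ranks. The function names and the sample data are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but not linear
rho = spearman_rho(x, y)  # perfect monotonic association
r = pearson_r(x, y)       # strictly below 1 because the relation is not linear
```

The cubic relationship gives a ρ of 1 while Pearson's r stays below 1, which is the difference the paragraph above describes.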
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
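The pair-counting definition above corresponds to the variant often called tau-a. A minimal sketch follows, with an illustrative function name and sample data; library implementations commonly use the tau-b variant, which additionally adjusts for ties, so results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 is a tie in x or y and counts as neither
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so τ is 0.6.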
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | df_index | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 58 | management | married | tertiary | no | 2143 | yes | no | unknown | 5 | may | 261 | 1 | -1 | 0 | unknown | no |
| 1 | 1 | 44 | technician | single | secondary | no | 29 | yes | no | unknown | 5 | may | 151 | 1 | -1 | 0 | unknown | no |
| 2 | 2 | 33 | entrepreneur | married | secondary | no | 2 | yes | yes | unknown | 5 | may | 76 | 1 | -1 | 0 | unknown | no |
| 3 | 3 | 47 | blue-collar | married | unknown | no | 1506 | yes | no | unknown | 5 | may | 92 | 1 | -1 | 0 | unknown | no |
| 4 | 4 | 33 | unknown | single | unknown | no | 1 | no | no | unknown | 5 | may | 198 | 1 | -1 | 0 | unknown | no |
| 5 | 5 | 35 | management | married | tertiary | no | 231 | yes | no | unknown | 5 | may | 139 | 1 | -1 | 0 | unknown | no |
| 6 | 7 | 42 | entrepreneur | divorced | tertiary | yes | 2 | yes | no | unknown | 5 | may | 380 | 1 | -1 | 0 | unknown | no |
| 7 | 8 | 58 | retired | married | primary | no | 121 | yes | no | unknown | 5 | may | 50 | 1 | -1 | 0 | unknown | no |
| 8 | 10 | 41 | admin. | divorced | secondary | no | 270 | yes | no | unknown | 5 | may | 222 | 1 | -1 | 0 | unknown | no |
| 9 | 11 | 29 | admin. | single | secondary | no | 390 | yes | no | unknown | 5 | may | 137 | 1 | -1 | 0 | unknown | no |
Last rows
| | df_index | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40680 | 45200 | 38 | technician | married | secondary | no | 557 | yes | no | cellular | 16 | nov | 1556 | 4 | -1 | 0 | unknown | yes |
| 40681 | 45201 | 53 | management | married | tertiary | no | 583 | no | no | cellular | 17 | nov | 226 | 1 | 184 | 4 | success | yes |
| 40682 | 45202 | 34 | admin. | single | secondary | no | 557 | no | no | cellular | 17 | nov | 224 | 1 | -1 | 0 | unknown | yes |
| 40683 | 45203 | 23 | student | single | tertiary | no | 113 | no | no | cellular | 17 | nov | 266 | 1 | -1 | 0 | unknown | yes |
| 40684 | 45204 | 73 | retired | married | secondary | no | 2850 | no | no | cellular | 17 | nov | 300 | 1 | 40 | 8 | failure | yes |
| 40685 | 45205 | 25 | technician | single | secondary | no | 505 | no | yes | cellular | 17 | nov | 386 | 2 | -1 | 0 | unknown | yes |
| 40686 | 45206 | 51 | technician | married | tertiary | no | 825 | no | no | cellular | 17 | nov | 977 | 3 | -1 | 0 | unknown | yes |
| 40687 | 45207 | 71 | retired | divorced | primary | no | 1729 | no | no | cellular | 17 | nov | 456 | 2 | -1 | 0 | unknown | yes |
| 40688 | 45208 | 72 | retired | married | secondary | no | 5715 | no | no | cellular | 17 | nov | 1127 | 5 | 184 | 3 | success | yes |
| 40689 | 45210 | 37 | entrepreneur | married | secondary | no | 2971 | no | no | cellular | 17 | nov | 361 | 2 | 188 | 11 | other | no |